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Abstract 

The absolute free energy — or partition function, equivalently — of a molecule can be estimated 
computationally using a suitable reference system. Here, we demonstrate a practical method for 
staging such calculations by growing a molecule based on a series of fragments. Significant computer 
time is saved by pre-calculating fragment configurations and interactions for re-use in a variety of 
molecules. We employ such fragment libraries and interaction tables for amino acids and capping 
groups to estimate free energies for small peptides. Equilibrium ensembles for the molecules are 
generated at no additional computational cost, and are used to check our results by comparison 
to standard dynamics simulation. We explain how our work can be extended to estimate relative 
binding affinities. 
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I. INTRODUCTION 



The use of a reference system for free energy calculations has a long history in physics 
and chemistry- The basic idea is to employ a reference system ("ref") for which the abso- 
lute free energy is available, and which is as similar as possible to the physical system of 
interest ("phys"). Historically, Stoessel and Nowak applied the reference-system strategy to 
a molecular system for the first time, using a solid harmonic reference system in Cartesian 
coordinates.- Zuckerman and Ytreberg extended that work in two ways designed to improve 
overlap between the reference and physical systems?^ (i) by using internal coordinates; and 
(ii) by using a more flexible, numerically exact reference system based on histograms from 
a short dynamics simulation, rather than an artificial analytically tractable reference state. 
Huang and Makarov also employed the reference-system approach embodied in (1), but in 
a different way- 

F phys = ^ref + A F ref ^ phys , (l) 

where F x is the absolute free energy of model "x," and AF ref ^ phys is the free energy difference 
between the systems. In essence, this paper is about practical choices for the both the 
reference system and strategies for calculating AFref-vphys when the physical system is a 
large molecule. 

Historically, Stoessel and Nowak applied the reference-system strategy to a molecular 
system for the first time, using a solid harmonic reference system in Cartesian coordinates.^ 
Zuckerman and Ytreberg extended that work in two ways designed to improve overlap 
between the reference and physical systems^ (i) by using internal coordinates; and (ii) by 
using a more flexible, numerically exact reference system based on histograms from a short 
dynamics simulation, rather than an artificial analytically tractable reference state. Huang 
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and Makarov also employed the reference-system approach embodied in (1), but in a different 
way. 4 

Other efforts to calculate absolute free energies for molecular systems have been ongoing 
for years in the groups of Meirovitch^ 1 ^ and GilsorA^ 1 ^ and more approximately using 
harmonic and quasi-harmonic methods.— The work of Meirovitch builds on long-standing 
polymer-growth methodologies for estimating partition functions which date to the work 
of Rosenbluth and Rosenbluth.— The original Rosenbluth work was generalized for higher 
efficiency and more realistic models by many workers.— J£JIJiU&i^i2L22i2iL2£i2L2£i2Z Ideas from 
these polymer-growth sampling methods also inform the present work. 

The present paper significantly extends the previous work by Ytreberg and Zuckerman^ by 
estimating absolute free energies for molecules built up gradually from molecular fragments. 
Larger molecules can be treated, compared to our previous paper-, because a series of 
staged intermediate systems are adopted. In essence the free energy difference of Eq. (1) is 
sub-divided (staged) into a sum of easy-to-calculate terms. Staging increments are highly 
tunable, based on the choice of fragment sizes and even by selection of subsets of interactions 
as detailed below. The use of fragments in other types of molecular mechanics calculations 
has a long history.— ^.rM 

A key contribution of this work is a practical strategy of pre-calculation which minimizes 
the number of energy terms which need to be computed at each stage. Specifically, for each 
fragment, a statistical library — i.e., an ensemble of configurations and their energies — is 
stored; we have also used such libraries in Monte Carlo sampling^. Additionally, for each 
covalently bonded fragment pair, we store the full interaction energy (based on all atoms) 
for every possible pair of configurations. Such storage is quite practical on typical modern 
computers with > 1 GB of RAM. During production simulations it is only necessary to 
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compute interactions between fragments separated by one or more other fragments. Needless 
to say, the stored libraries and interaction tables can be re-used in future simulations of 
the same or different molecules. The pre-calculation strategy, which has early conceptual 
roots^ 3 - 3 - 1 ^, appears to represent a significant practical advance over earlier polymer-growth 
calculations. The use of non-statistical libraries has been popularized in the Rosetta protein 
folding program. 35 

There is substantial flexibility in the division of a molecule into fragments. We have 
used single amino acids as fragments in this study, but larger segments and even different 
interaction subsets as detailed below - may also be practical. The fragment-based approach 
could also be used to study protein-ligand binding, by growing small molecules into receptor 
binding pockets and estimating the free energies. This can be seen as a statistical mechanical 
generalization of fragment-based ideas developed earlier.—" 3 ^ 

Our results, which employ single amino-acid fragments, are extremely encouraging. The 
data indicate that absolute free energies for small peptides can be calculated rapidly and reli- 
ably. Specifically, high-precision free energy estimates, with fluctuations of ~ 0.3 kcal/mole, 
are obtained for 52-atom tetra-alanine in less than an hour of single-processor computing 
time, with a simple dielectric "solvent". We check our data by comparing the equilib- 
rium ensembles (obtained simultaneously with the free energy estimates) with independent 
Langevin simulations. As a further check, in one case, the free energy results are verified by 
an independent calculation using different fragments. 

The remaining sections of the paper describe the methods, results, and our conclusions. 
Our methods section provides full details for performing the calculations, including the 
generation of fragment libraries and interaction tables. We also correct a technical error 
in our earlier study. 3 Our results describe both the free energy values and the analysis of 
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the equilibrium ensembles. Our discussion section describes possible improvements to the 
method and extension to the estimation of relative binding affinities using absolute free 
energies. 

II. METHODS 

Our basic approach is to calculate the free energy of the physical system of interest 
based on the difference from a known reference system, as in Eq. ([IJ), and also to stage the 
calculation using molecular fragments. The fragments not only permit the gradual staging 
of the calculation but also a tremendous savings of computer time based on the storage of 
(i) fragment configurations, (ii) energies internal to each fragment configuration, and (iii) 
interaction energies between covalently bonded fragments. The low cost and high precision 
of the resulting estimates suggests we are far from the practical limit of the approach in the 
present implementation. However, a number of improvements to the implementation appear 
to be within easy reach, as described in our Discussion. All fragment libraries used in the 
present calculations are available at our website (www.ccbb.pitt.edu/Zuckerman). 

A. Model and systems 

All calculations employ a standard atomistic forcefield, OPLS-AA^ at T = 298K. In the 
present report, our fragments are individual amino acids and capping groups. For simplicity 
in this initial investigation, we model the solvent by a simple uniform dielectric constant 
e = 60. We compute free energy estimates for alanine dipeptide (Ace-Ala-Nme), di-alanine 
(Ace-(Ala) 2 -Nme), and tetra-alanine (Ace-(Ala) 4 -Nme). Following standard conventions, 
Ace is Acetyl (CH 3 -CO), Ala is Alanine (NH-CH(CH 3 )-CO), and Nme is N-methylamide 
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(NH-CH 3 ). 



B. A simple example 



Consider the calculation of the configurational free energy of alanine dipeptide based on 
a division into three fragments (Ace, Ala, Nme) which can be denoted (A, B, C) respectively 
(see Fig.l). In advance, we calculate statistical libraries of configurations for each fragment, 
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Figure 1: 



which are constant-temperature OPLS-AA ensembles based only on the atoms within the 
given fragment. The libraries additionally include the six degrees of freedom necessary for 
joining the fragments, based on the use of "dummy atoms" as described below. During the 
library generation process, the absolute free energy for each fragment is also calculated using 
a reference system as described previously- A typical library will contain 10,000 configu- 
rations. We also pre-calculate every possible interaction energy between covalently bound 
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fragments — i.e., a table of 10 8 interaction energies for the A-B and B-C fragment pairs. 

The calculation proceeds as schematized in Fig.Q], where the presence of a line connecting 
two fragments indicates that all interactions between the fragments is included. The refer- 
ence system (not shown) consists of fully independent coordinates, so that the fragments are 
not yet constructed. The first intermediate consists of the three non-interacting fragments, 
which include, however, all interactions within each fragment. Thus the fragment free ener- 
gies, which are calculated and stored in advance, properly include the interactions among all 
degrees of freedom internal to each fragment. Other interactions are added in three stages: 
A-B interactions first, followed by B-C, and completed by A-C couplings. 

In the first intermediate stage, the absolute free energies for the individual fragments are 
retrieved from disk. (They are initially calculated following reference^ as detailed below.) 
Next, A-B interactions are added by a standard free energy difference calculation. Specifi- 
cally, an ensemble of non-interacting A-B configurations is generated by random combination 
of fragments from the A and B libraries, and the resulting energy change is exponentially 
averaged in the usual way — via Eq. (6) below. The energy changes due to the combination 
do not need to be calculated in our scheme, however, because they have been tabulated in 
advance. Additionally, the now interacting A-B fragments are "resampled"— to correspond 
to the full potential for all degrees of freedom in both fragments. The details of resampling 
are given below — see Eq. (Jj[J) — but the bottom line is that one obtains 10,000-configuration 
ensemble of the partially grown molecule consisting of the A-B fragments. 

The calculations then proceeds as if there are two fragments, A-B and C. The two libraries 
are joined combinatorially but only accounting for the B-C interactions at this stage. The 
A-C interactions will be handled at a later stage. Once again, the free energy change is 
calculated and the ensemble is resampled to reflect B-C interactions. The resulting ensemble 
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contains 10,000 configurations of the full molecule reflecting all interactions except those 
between fragments A and C. 

In the final stage sketched in Fig. [U the A-C interactions are added in a standard free 
energy difference calculation based on the the ensemble of the previous stage. However, a 
standard energy call (for the whole molecule) is not required to save CPU time. Rather, our 
code only computes energy terms specific to the A and C fragments — i.e., electrostatic and 
van der Waals interactions between atoms of A and those of C. Once the energy changes 
have been obtained, the full free energy is rapidly estimated. Resampling into the fully 
interacting ensemble can also be performed rapidly without additional energy calculations. 

It is not difficult to imagine generalizing this example to systems with more fragments. 
It is also worth noting that, strictly speaking, the final stage was not necessary. That is, 
we could have added the A-C interactions simultaneously with the B-C combination since 
the full molecular configurations were constructed at that point. These choices illustrate 
the flexibility intrinsic to staging the calculation with fragments, as we detail further in our 
Discussion. Additional staging flexibility results, of course, from the initial choice of the 
fragments — i.e., smaller fragments lead to staging in finer increments. 

C. Basic formalism 

Our calculation of the absolute free energy F phys for a molecule divided into fragments 
is based on standard, straightforward equations. The only novel aspect of the formalism 
is our particular choice of stages based on the addition of fragments and/or inter- fragment 
interactions. Although our heavy reliance on pre-calculated information has very significant 
practical implications, it does not affect the formalism. 
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Definition of the free energy 

The fundamental object of interest is the absolute classical free energy F phys for an 
implicitly solvated molecule. The molecule is taken to consist of iV atoms, and its internal- 
coordinate configurations are denoted by x. The potential energy function will be a standard 
forcefield (here, OPLS-AA— ), possibly augmented by an implicit solvation model; the full 
potential energy including any solvation will be denoted by ?7 phys (x). The free energy, which 
is a functional of U phys , is defined by the dimensionless configurational partition function at 
temperature T = 1/ksP via 



where the measure of integration dx is understood to include any necessary Jacobians. Ki- 
netic energy terms have already been integrated out. Both the dimensionless character of 
the partition function in Eq. ((2j) and the angstrom-based normalization result from a partic- 
ular choice for the standard concentration C° (defined in references'^) — or equivalently, 
for the volume containing our implicitly solvated molecule. In particular, we have chosen 
C° = 87T 2 (1 h) m ~ 3 Q p /cr, where Q p = Ylf =1 (27r rrii k B T/h 2 ) 3 / 2 results from the momentum 
integrals, h is Planck's constant, and a is the molecule's symmetry number. We note that 
our chosen standard concentration varies based on the molecule (i.e., based on the number 
and masses of its atoms), and also eliminates the temperature dependence of Q p . However, 
in almost every application of interest (see Discussion, below), the absolute free energy cal- 
culated here ultimately will be used to estimate a free energy difference and eliminate any 
artifacts due to C°. 

Our single-molecule formulation, as noted, allows for "implicit solvation" using an effective 
solvent term in U phjs that is solely a function of the internal molecular coordinates x. The 
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present calculations employ a simple uniform dielectric constant (e = 60). In our Discussion, 
we address the minor technical issues involved with using a more realistic implicit solvent 
model. 

One issue of dimensionality is worth emphasizing. Although there are 3N — 6 internal 
coordinates for a molecule consisting of N atoms, the integral of Eq. ((2j) has dimensionality 
of length to the power 3N — 3. This is because N — 1 bond lengths remain in the full set 
of internal coordinates x, each of which contributes three powers of length. Put another 
way, of the six excluded rigid-body/center-of-mass coordinates, the three orientation angles 
are dimensionless; more specifically, the angles integrate to the factor of Sn 2 included in the 
definition of C° 

Staging the free energy calculation 

As illustrated in the example of Sec. Ill Bl we will calculate the free energy in a series of 
stages. These can be understood most easily by adding and subtracting the free energies 
corresponding to k intermediate models, 

= AF k ^ phys + AF k + h AFi + F ref , (4) 

where AFj = Fj — Fj-i and Fj[Uj] is defined in analogy to Eq. (j2J) for the intermediate 
models defined by Uj. The Uj potentials will be specified below. 

All free energy difference calculations will be performed here using the "perturbation" 
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formulation.— Explicitly, for two arbitrary potential energy functions U a and U b , one has 

AF a ^ b = F b [U b ] - F a [U a ] = -k B T\n (exp [-(3 (U b - U a )]) a (5) 

~ -A; B Tln|^^exp[-/?(^(x J )-f/ a (x i ))]| (6) 

where the subscript a denotes an average performed over configurations distributed according 
to the U a ensemble and N a is the number of configurations in that ensemble. Eq. (jSJ) is used 
to estimate the free energy differences required in Eq. (j3J), and it is exact in the limit 
N a -> oo. 

We emphasize that succeeding intermediates are constructed to have progressively nar- 
rower distributions as more interactions are added, as in the alanine dipeptide example. 
In other words, we ensure good overlap and reliable AF estimates by proceeding in the 
generalized "insertion" direction.— 1 ^ 

Resampling to obtain staged equilibrium ensembles 

As our calculation proceeds through the various stages, we will require the correspoinding 
equilibrium ensembles for each stage, primarily for use in Eq. (J6j) . These are obtained 
by "resampling," the process of converting an ensemble for one distribution into another 
by eliminating, duplicating, and/or adjusting the weights of configurations in the original 
distribution.— In our case, we primarily use elimination of configurations from a larger 
ensemble (e.g., all combinations of fragments A and B) to create a smaller one (e.g., the 
interacting A-B ensemble); we do not adjust weights. More specificially, to resample an 
ensemble of configurations x a generated according to U a into a U b ensemble, the original 
configurations are randomly selected with probability proportional to the ratio of Boltzmann 
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factors, 



-P[U b (Xa)-U a {Xa)] 



(7) 



Operationally, we select configurations by forming a cumulative distribution function (cdf) 
based on the normalized set of ratios (jlj), and then choosing from this cdf as many times as 
desired. 
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Figure 2: 



D. Choice of intermediate models 

As already noted, the set of intermediate models {Uj} can be chosen in a variety of ways. 
In the present study, we employ k intermediates for a molecule divided into k fragments. 
This was exemplified for alanine dipeptide, which is divided into three fragments. 
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The present study uses a uniform staging strategy for all molecules examined, as exem- 
plified in Figs. Q] and El The reference system consists of all coordinates fully independent 
— both within and between fragments — as in our previous work.- The first intermedi- 
ate stage adds interactions within fragments, so that one has true molecular fragments but 
no interactions between fragments. We then add interactions between neighboring, cova- 
lently bound fragments — i.e., among all the atoms in the neighboring fragment pair — 
one fragment pair at a time. The final stage of this simple scheme involves the addition of 
all remaining interactions, which occur solely between non-adjacent fragments. The result 
is a molecule with atoms interacting fully according to a standard forcefield and possibly 
continuum solvent model. 

Because interactions among previously non-interacting components are added at every 
stage, it is expected that the configuration space will become increasingly narrow. Such a 
progressive narrowing justifies the use of the perturbation expression ([6j). 

To explicitly illustrate the staging scheme employed here, consider the case of a molecule 
divided into the three fragments A, B, and C, as in Fig. [H We denote by w- ef the reference 
potential for internal coordinate Xi, where the full set is x = (x\, X2, ■ ■ ■)■ For the fragments, 
we let U y be the potential energy for all interactions internal to fragment y, and U yz is 
the potential energy for all interactions between the y and z fragments. A three-fragment 
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molecule would be staged as follows: 

3AT-6 

jjref = J- U f{ Xi ) 
t=l 

U 1 = U A + U B + U c 

u 2 = u 1 + u AB (8) 

U 3 = U 2 + U BC 

The choice of the reference potentials {u\ ei } is guided by the forcefield, as detailed below 
in Sec. HTGl 

A four-fragment molecule, such as di-alanine (Ace-(Ala)2-Nme) schematized in Fig. [2, 
would be staged according to: 

3N-6 

U Tcf = u Tei {xi) 

i=l 

U 1 = U A + U B + U C + U D 

U 2 = U X + U AB ^ 
U 3 = U 2 + U BC 
U 4 = U 3 + U C d 
U^ s = U 4 + U AC + U BD + U AD 
As described in the Discussion, it is also possible to stage the final ("non-bonded") pairwise 
interactions separately. 

We anticipate that significant optimization can be obtained by adjusting fragmentation 
and staging schemes. While our Discussion, below, describes more gradual staging strategies, 
the present initial report is limited to the single staging strategy given above. 
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E. The non-interacting reference system 

The computation of the (absolute) reference free energy F ref is perhaps the most techni- 
cally involved step of the calculation. The remaining free energy differences in the decompo- 
sition of F phys in Eq. (Jlj) are estimated using a simple, standard method. For the reference 
free energy, however, great care must be taken with the normalization and Jacobian factors 
of the chosen probability distributions. Indeed, our previous report^ includes an error in 
this regard, as explained at the end of this subsection. 

As described in our discussion of staging (Sec. Ill Dl) . the reference system for all molecules 
studied here consists of the set of non-interacting internal coordinates. The reference poten- 
tial energy function will be constructed, following our previous work-, so that the reference 
partition function is normalized to one. That is, we will construct our reference model U rcf 



where the same standard concentration as in Eq. ((2j) has been used implicitly. (From this 
point forward, we will omit writing the length units, but they should be implicitly associated 
with every bond-length integration.) The motivation for the unit normalization of Z rei is 
that application of a logarithm leads to the simplifying value, 



for every system. 

While there are many ways to construct U ref to satisfy the required normalization of Eq. 
( fTOl) . we use the strategy of employing independent internal coordinates as in our earlier 
work." As usual, the full set of 3A — 6 internal coordinates, x = (x\, 22, • • • , ^37v-6) consists 
of A — 1 bond lengths, A — 2 bond angles, and A — 3 dihedrals. So long as the distribution 



so that 




(10) 



0. 



(11) 



15 



of each individual coordinate is normalized when integrated with the appropriate Jacobian 
factor J, the full distribution will be normalized. 

Because total reference energy is given by a simple sum of independent terms, 

37V-6 

^ ef (x) = (12) 

i=l 

the desired normalization (fTUl) is ensured by enforcing 

J dxi J{xi)e- pu ^ = 1 , (13) 

where the inclusion of inverse length units is understood for bond-angle integrals. In words, 
then, each individual potential u™ 1 must include suitable normalization — which is accom- 
plished by offsetting the potential by the log of the integrated (un-normalized) Boltzmann 
factor. See reference^ for further information. 

(As detailed in Sec. Ill Gl peptide and ip angles were, in fact, sampled together from a 
single distribution based on a pairwise energy function w^. These angles are independent 
from all other coordinates, however. We emphasize that this exception does not alter the 
basic formalism, which has been simplified very slightly for clarity.) 

It is very useful to observe that normalization of the coordinate distributions via Eq. 
( 1T31) can be achieved either using standard analytic forms — e.g., Gaussians — or via 
numerical histogramming procedures.- Thus, there is great flexibility in the choice reference 
distributions embodied in the reference potentials. In addition to forcefield terms, prior 
knowledge, such as from a simulation, can be used in constructing the set {wj ef }. The 
reference potentials chosen for the present study are described below in Sec. Ill Gl on library 
construction. 

One word of warning is appropriate here. Although it is possible to describe the internal 
configuration of a molecule using additional bond angles to substitute for dihedrals in some 
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cases, the Jacobian for such a description appears not to be well-defined. Therefore, it is 
necessary to use the standard description with N — 2 bond angles and N — 3 dihedrals. 
Unfortunately, we were unaware of this point during our original study,- and therefore an 
erratum will be prepared correcting the resulting numerical errors. 

F. First intermediate: non-interacting fragments 

The first intermediate stage adds only localized interactions to the non-interacting ref- 
erence model, as illustrated in Figs. Q] and El Specifically, once a molecule is divided into 
fragments (A, B, C, ...), the first intermediate includes only interactions occuring within frag- 
ments. The fragments exactly divide all coordinates so that we can write x = (x^,x#, . . .) 
and the potential energy function for this stage is given by 

U 1 {x) = U A (x A ) + U B (y: B ) + --- , (14) 

where U y includes all interactions from the full forcefield (OPLS-AA, in our case) among 
the fragment coordinates x y for y = A,B,.... Importantly, the fragment potential U y 
includes all non-bonded interactions — electrostatic, van der Waals — among the atoms 
of the fragment. (Sec. Ill Gl on our libraries describes the treatment of connecting "dummy" 
atoms.) 

The free energy for this stage — i.e., Fi[C/i] for use in the key equation recalling 
F ref = — can be calculated by using the standard perturbation relation (J6j). For such 
a computation, one would use U a = U rei and Ub = U\ along with an ensemble distributed 
according to the Boltzmann factor of t/ ref . 

In practice, once the libraries are generated, no calculation of energies needs to be done. 
As detailed below the libraries are generated (just once, for repeated use in many systems) 
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based on the U distribution. Thus, during the library generation process, it is a trivial 
matter to calculate the absolute free energy for each fragment using Eq. ([6]). Thus, individual 
fragment free energies F y are calculated in advance that exactly sum to the desired first-stage 
free energy: 

F 1 = F A + F B + ■ ■ ■ . (15) 

Further, the independent-coordinate distributions are subsequently resampled based on Eq. 
(jlj) to generate the library distributions — i.e., ensembles for the U y Boltzmann factors — 
for use in subsequent stages. 

G. Construction of fragment libraries 

As just described, fragment libraries are critical to the calculation of the free energy of 
the first intermediate stage, Fx. The libraries also greatly facilitate computations for the 
rest of the intermediates. 

In general terms, fragment configurations are generated by sampling internal coordinates 
according to the independent probability distributions which constitute the reference system. 
The generated configurations are then used to calculate fragment free energies, F y for y = 
A,B,.... The configurations are also reweighted into an ensemble distributed according to 
the full forcefield for Xj,, the degrees of freedom internal to fragment y. Typically, such a 
procedure can be applied only to systems with a sufficiently small number of degrees of 
freedom. For large systems with enough correlated degrees of freedom, there tends to be 
insufficient overlap with the reference system of independent coordinates. That is, only a 
tiny fraction of the reference-distributed configurations will be important in the interacting 
ensemble. Therefore, the choice of the generating probability is essential for the efficient 
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generation of libraries. 

We found that for fragments the size of alanine residues, rather simple probability func- 
tions were sufficient for generating tens of thousands of (statistically independent) config- 
urations in weeks of single-CPU time. This is a negligible cost because once a library is 
generated, it can be used in multiple simulations. 

Different coordinate types are best sampled with different distributions, as is suggested by 
the forcefield terms. Regardless of the particular choice, the specification of the distribution 
immediately implies the functional form for the reference potential uj ei from Eq. ( fT3l ). We 
found that simple Gaussian distributions, with parameters extracted from a short Langevin 
simulation, worked well for bond lengths and bond angles. For "stiff" dihedrals, such as those 
in relatively planar groups (e.g., peptide bond), a Gaussian is also appropriate. For "soft," 
rotatable dihedrals — such as <fi, ip and \ angles in amino acids — we simply extracted 
histograms from a Langevin simulation of alanine dipeptide, as described in reference^. A 
two-dimensional (correlated) probability function was used for the (0, ip) dihedral pair, but 
a one-dimensional distribution was used for all other dihedrals. 

Based on the distributions just described, internal coordinates were sampled indepen- 
dently (except for pairwise sampling of and ip dihedrals) using an in-house program written 
in C. Generated configurations were saved to disk and converted to Cartesian coordinates. 
The corresponding forcefield energies for each configuration were calculated using the "ana- 
lyze" module of the Tinker software package. 41 Based on these values and the known reference 
energies, the individual fragment free energies were calculated using Eq. (0). A simple re- 
sampling procedure^ was used to generate a fragment ensemble distributed according to the 
forcefield; see Eq. (13). Only a small fraction (jt 10~ 4 ) of reference-ensemble configurations 
remain after resampling, requiring extensive sampling of the reference ensemble and weeks 
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of CPU cost, as mentioned earlier. 

For this study, we generated libraries consisting of 10,000 configurations. All fragment 
libraries were sampled according to OPLS-AA forcefield at T = 298K, with a simple dielec- 
tric constant (e = 60) modeling solvent. The choice of dielectric constant was motivated by 
the reasonable behavior observed in separate Langevin simulations of poly-alanine systems 
(data not shown). 

As noted earlier, for all possible 10 8 covalent (neighboring) pairings of fragments, we 
also tabulated the interaction energies from the forcefield, accounting for all atoms in the 
fragment pair. Suitable corrections for dummy atoms (see below) were made. In other 
words, for a simple two-fragment system, all interactions are stored. 

Use of dummy atoms 

Because fragments are sampled independently from each other, the six degrees of free- 
dom that specify the relative orientation of neighboring fragments are included with the 
fragments. For this purpose "dummy" atoms are used to provide the extra coordinates. We 
stress that our use of dummy atoms was implemented carefully to avoid adding additional 
degrees of freedom (e.g., certain bond lengths and angles). We chose to have the dummy 
atoms interact with the true fragment atoms for better overlap with subsequent ensembles. 
Thus, when the fragments are joined, the interaction energies of dummy atoms should be 
subtracted from the full fragment energy because dummy atoms are replaced with neigh- 
boring fragment atoms. (Of course, it is simpler to have non-interacting dummy atoms.) 

The dummy atoms used at the N-terminus of a fragment are carbonyl C, carbonyl O and 
terminal alpha-C with valence set to one. The dummy atoms used at the C-terminus are 
amide N, amide H, and terminal alpha-C with valance also set to one. The dummy atoms 
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were assigned the same forcefield parameters as used in the corresponding fragment atoms. 



H. The second and subsequent intermediates: adding neighboring fragment inter- 
actions 

Returning again to the scheme embodied in Eq. pj, as well as in Figs. [Hand EJ the next 
intermediates add interactions between neighboring fragments. These can be considered 
the "bonded" interactions in the space of fragments, but non-bonded interactions among 
all atoms in the neighboring pair are included. Explicitly, the models for the remaining 
intermediates are described by 



where U yz is the full interaction energy — based on the forcefield and solvent model — 
between fragments y and z. 

Formally, it is clear what needs to be done. The ensemble of the previous stage j — 1 
should be used to calculate AFj using the perturbation relation ([6|) with Uj-\ and Uj. 

Again, however, possession of the libraries and interaction tables leads to dramatic prac- 
tical implications. For instance, by construction, the energy [7 2 — U\ is simply the pre-stored 
energy Uab', similarly U3 — U2 = Use- These tabulated energies are used directly in Eq. 
(jSJ) without the need for additional energy calls. The required ensembles for each stage are 
generated by the rapid resampling procedure of Eq. (£j[J) . In this way, one readily generates 
the free energy differences AF 2 , AF 3 , ... required for the evaluation of F phys via Eq. (j4j). 

Caution is required when the molecule of interest contains repeated fragment pairs. While 
the same libraries can be used for the repeats, say at intermediate stages j and m, the 



C/ 2 (x) = U a (xa) + U B (x B ) + ■■■+ U AB (x A , x B ) 



(16) 



C/ 3 (x) = C/ 2 (x) + ?7 bc (xb,x c ) + -- - , 



(17) 
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corresponding values of AFj and AF m will be different in general. To see the reason, consider 
the case of the tetra-alanine peptide studied below. The term AF 2 corresponds to including 
the interaction of the already combined fragments Ace-Ala with the next Ala. Note that 
the free energy difference AF 2 is calculated via Eq. using the Ace-Ala ensemble as the 
"a" system. By contrast, consider the calculation of AF 3 for the addition of the next Ala 
- now to the Ace-(Ala) 2 ensemble. Although the free energy change will be based upon 
the identical (tabulated) interactions, the associated Boltzmann factors in Eq. j6j) will be 
weighted differently — i.e., occur with different frequencies — due to the differing initial "a" 
ensembles. In turn, this will lead to different free energy changes, so that AF 3 7^ AF 2 . 

I. The final free energy difference: non-neighboring interactions 

As described in the master scheme of Eq. (TjJ and illustrated in Figs. Q] and [21 the final 
calculation needed to obtain F phys entails the inclusion of all remaining interactions in the 
forcefield and solvent model. These interactions, excluded until now, occur between atoms 
in non-neighboring pairs. As described in Sec. II D, for a molecule of k fragments, the full 
physical potential energy function (i.e., the forcefield) can be written as the difference from 
the final (kth) intermediate: 

^Phys (x) = jj^ + U yz (* y , X 2 ) , (18) 
y..z 

where the sum is over non-neighboring pairs of fragments — i.e., AC, AD, BD, .... 

In this case, the necessary energy terms for use in the calculation of AFfc^ p h ys via Eq. 
(El) must be calculated. They cannot readily be stored in advance, due to the combinatorial 
explosion of possible configurations. For instance, with libraries of 10 4 configurations, there 
are 10 12 possible configurations for three fragments, which is beyond the range of current 
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commerical machines. 

J. Generating an equilibrium ensemble without additional energy calls 

The physical ensemble, distributed according to the Boltzmann factor of the forcefield, 
can be generated by resampling the U). ensemble — the last intermediate — using Eq. 
In this case, the "a" ensemble corresponds to U/~ and the "b" ensemble to the full forcefield 
and (implicit) solvent model. Because all energy terms have already been calculated, no 
additional energy calls need to be made. The necessary resampling computation is extremely 
fast compared with preceding stages of the protocol. 

K. Checking the code and estimating uncertainty 

Although the formalism governing the present study is mostly straightforward, our in- 
house computer program not only needs to reproduce standard forcefield results, but also 
requires complicated "dissections" of various subsets of forcefield terms. We therefore per- 
formed three types of checks on our code, (i) We checked that the forcefield energy for 
full molecular configurations exactly reproduces the results reported in Tinker (data not 
shown). This verifies that we have correctly accounted for our dummy-atom energy terms, 
(ii) Using our previously developed "structural histograms" for analyzing configuration-space 
distributions,— ^2 we checked that the equilibrium ensembles produced during our free en- 
ergy calculations agree with independent Langevin simulations. This data is shown in the 
Results section, and generated as explained below, (iii) Finally, we performed a check to 
ensure that our final free energy values are independent of the choice of fragments. This 
data, for two- and three-fragment decompositions of alanine dipeptides is also shown in the 



23 



Results section. 

Statistical error 

Statistical uncertainties were calculated by running 20 independent computations for 
every free energy value reported. Twice the standard deviation among these 20 values is 
reported, which quantifies the scale of expected statistical error for a single simulation. The 
repeated simulations were run using 20 independent sets of libraries for the various fragments 
— i.e., the calculation was started all the way at the beginning in each repeat. However, 
because the overlap between various stages is the limiting factor in the quality of the free 
energy results, rather than our fairly large libraries, we anticipate similar error estimates 
would be obtained for one set of libraries. 

Analyzing equilibrium ensembles/distributions 

In two previous studies,— 1 ^ we have developed methods for comparing equilibrium dis- 
tributions for molecular systems of arbitrary complexity. The central idea is to employ a 
"structural histogram" which simply classifies (divides) configuration space into a number 
of "bins" (regions) . Two correct simulations should yield the same results for the fractional 
populations of the bins, within statistical error, regardless of whether the bins correspond to 
physical states/free energy basins. (Furthermore, the statistical uncertainty in the popula- 
tion estimates can be used to quantify the "effective sample size".)^ In the present work, we 
compare equilibrium distributions from fragment combination and from standard Langevin 
simulations based on structural histograms. The particular histograms employed in the 
present study have five bins derived from a Voronoi construction;— the reference structures 
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for the Voronoi procedure are derived from the equi-probability scheme described in refer- 
ence.— Although the resulting bins are not exactly equally probable, each is guaranteed to 
represent a contiguous region in configuration space due to the Voronoi construction. 

III. RESULTS 

The absolute configurational free energy F phys was calculated for the monomer, dimer, 
and tetramer alanine peptides: alanine dipeptide (Ace-Ala-Nme), di-alanine (Ace-(Ala) 2 - 
Nme), and tetra-alanine (Ace-(Ala) 4 -Nme). For alanine dipeptide, the free energy was es- 
timated based on two different fragment sets as a check on our code. Twenty independent 
calculations for every F phys estimate were performed to quantify uncertainty, as described 
above. Additionally, every free energy calculation also yields an equilibrium ensemble, which 
is compared to independent Langevin simulations. 

The results are very positive in every regard, and rather rapid as reported at the end of 
the Results section. The amount of memory used, which is a key to the present calculations, 
is also reported. 

The results reflect the uniform protocol adopted here. First, absolute free energies for non- 
interacting fragments are calculated. Then free energy changes resulting from interactions 
among covalently bound fragments are added ("bonded" terms, in the space of fragments), 
one at a time in sequence. Finally, all remaining interactions are added, which account to 
("non-bonded") interactions among all non-sequential fragment atoms. The final free energy 
values reflect the full OPLS-AA forcefield^ as implemented in Tinker.— 
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A. Alanine dipeptide using two different fragmentations 

Because of the complexity of the fragmentation procedure and the lack of reference stan- 
dards for absolute free energy values, we wanted to ensure our code was introducing no 
artifacts. We were particularly concerned about the interacting dummy atoms which intro- 
duce "temporary" energy terms, that must be corrected for properly at every combination 
stage. We find excellent agreement between free energy estimates based on two- and three- 
fragment decompositions. 

"Standard" three-fragment decomposition 

In our standard decomposition for the present study, we separate peptide and amino 
acid groups. For alanine dipeptide (AD), then, the three standard fragments are Ace, Ala, 
and Nme, and the corresponding stages for the free energy calculation are given in Eq. (jHJ). 
Recalling our convention that F Tei = 0, the free energy terms from Eq. (j4j) can be written 

as 

F ref = 

AFi = i<Ace + -^Ala + -^Nrne 

AF 2 = AF Acc ^ Ala (19) 

AF 3 = AF A la^Nmc 
A^3_>phys A-F^onbonded ■ 

where F y is the absolute free energy (including dummy atoms) for fragment y and AF x ^ y 
indicates the free energy change of combining fragments x and y (which includes all bonded 
and non-bonded terms, as well as the correction of dummy terms). Finally, AF nonbonded 
denotes the free energy change in going from an ensemble where sequentially separated frag- 
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ments do not interact to a fully interacting ensemble (in this case, the Ace-Nme interactions 
are added). 

Two-fragment decomposition 

As an alternative decomposition, we used Ace-Ala as one fragment and Nme as the 
other. Importantly, the Ace-Ala library and absolute free energy were not generated from 
a combination of the two smaller libraries, but instead from a ground-up calculation based 
on independent coordinates as described in the Methods section. 

In this case, then, the free energy terms from Eq. ([4]) become 

F ref = 

AFi = -FAco-Ala + -^Nme (^0) 
AFi^phys = AFAce-Ala^Nme , 

where it is notable that in the two-fragment case, all interactions are included in the libraries 
and interaction tables. In other words, no energy calls at all are needed. 

Comparison of free energies 

There is essentially perfect agreement between free energies estimate via the two inde- 
pendent decompositions, which provides a reassuring check on our computer program. The 
full results are given in Table fl] Notably, the two-fragment decomposition has a higher 
variance, which probably results from a decreased "precision" in the pre-generated Ace- Ala 
ensemble. In the composite pre-generated Ace- Ala ensemble, the whole configuration space 
is represented by 10 4 configurations, whereas when Ace and Ala from separate 10 4 -member 
libraries are combined, there is a much denser coverage of configuration space. 
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Equilibrium ensemble compared to standard simulation 

Our free energy computation produces an equilibrium ensemble through repeated resam- 
pling procedures at each stage, as explained in the Methods section (Sec. Ill Jh . As a further 
check on our data, we compare the equilibrium ensembles generated from our fragment com- 
bination procedure to those produced by long Langevin simulations performed in Tinker.— 
The results, shown in Fig. [3](a), indicate that our computation is indeed producing correct 
equilibrium ensembles. The graph shows the populations of different regions of configuration 
space, which was divided up using a Voronoi procedure explained above (Sec. Ill Kl) . The 
alanine dipeptide equilibrium distribution was generated from the three-fragment protocol, 
and the 1 /zsec. Langevin simulation (20*50 nsec) was performed in Tinker using a friction 
constant of 10.0 ps' 1 at T = 298K. 

B. Di-alanine 

Using the same libraries as for the alanine monomer above, we now calculate the absolute 
configurational free energy for the di-alanine peptide (Ace-(Ala) 2 -Nme). The staging used 
is described in Eq. (J9j) , which corresponds to the following free energy terms for use in Eq. 
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F ref = 



AF, = F Ace + 2 • F Ala + Fj 



Nme 



AFo = AF, 



Ace^Ala 



AF, = AF, 



Ala^Ala 



(21) 



AF 4 = AF, 



Ala-+Nme 



A Fa 



AFr,, 



4^phys '-^-'nonbonded • 

It is important to note that AFAia^Aia for di-alanine differs in principle from the term with 



the same name in Eq. ( fl9l) for alanine-dipeptide because the prior ensemble is different. 
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This point was explained in Sec. Ill Hi However, in practice, the difference among the three 
systems is not statistically significant. 

The free energy values are once again calculated with high precision: fluctuations are a 
fraction of one kcal/mole. The data for all free energy terms is given Table HT1 where we see 
a significant change in the AF non b on( }ed term, reflecting the increased number of attractive 
interactions in this larger molecule (compared to alanine dipeptide). 

Similarly, the agreement among bin populations for di-alanine in Fig. [3](b) is excellent, 
which provides an independent reason for having confidence in the free energy results. The 
Langevin simulation for di-alanine was performed with exactly the same parameters as for 
alanine-dipeptide. 

C. Tetra-alanine 

Our results are of high precision (~ 0.1 kcal/mole standard deviation) even for tetra- 
alanine. The staging follows our standard procedure, with the only subtlety in the present 
case is that the addition of every Ala residue is different, because the "growing" ensemble is 
different in every case. Thus we consider the first alanine (Alal) separate from the second 
(Ala2), and so on. 
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F ret = 

AFi = FAce + 4 • FAla + -^Nme 
AF 2 = AF Ace ^Alal 
AF 3 = AF A lal^Ala2 

(22) 

AF 4 = Ai*Ala2-+Ala3 
AF 5 = AFAla3^Ala4 
AF 6 = Ai<Ala4^Nme 
A-Fg—^pJjys AF^o^Qnfjej . 

Our data for each of these terms is given in Table [III Although the different alanine 
additions are based on different ensembles, the results show they are statistically indistin- 
guishable in this case. However, the "non-bonded" term AF nonbonded again is significantly 
different from the previous molecules, as expected. 

In comparing the equilibrium distributions from fragment combination and Langevin 
simulation, once again there is good statistical agreement. For Langevin simulation of tetra- 
alanine, all parameters were set as before, except for a friction constant of 5.0 ps' 1 , which 
does not alter the equilibrium distribution. The contrast between the large fluctuations 
in the bin populations pi and the high precision of F phys in Table II reveals an important 
lesson: sampling is harder than free energy calculation. It is also noteworthy that we 
have already achieved significant efficiency improvements for fragment-based equilibrium 
sampling, beyond what is reported here, which will be reported in future work.— 
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D. Timing and memory usage 

The calculations were reasonably inexpensive, taking 20 minutes for alanine dipeptide, 30 
minutes for di-alanine, and 50 minutes for tetra-alanine using one processor of an Intel Xeon 
3.20 GHz machine. Concerning memory, a single library containing 10,000 configurations 
requires 11 MB for Ala, 12 MB for Ace-Ala complex and 5.7 MB for Ace and Nme. An 
interaction table containing 10 8 pair- wise interactions uses 1.3 GB. 

IV. DISCUSSION 

A. The overall strategy and results 

Overall, the precision of our free energy estimates was very high, which we attribute 
to two related factors. First, our ensembles in the reference and intermediate stages were 
of good statistical quality — i.e., characterized by a large effective sample size (data not 
shown).— Second, there was good overlap between the stages, which indeed contributed to 
maintaining the sample size throughout the stages. The overlap is present by design, as in- 
teractions were always added between stages. The addition of interactions or, equivalently, 
correlations among degrees of freedom is guaranteed to reduce the entropy— This progres- 
sive narrowing of configuration space is consistent with Kofke's proposal to calculate free 
energy differences in the "insertion" direction.— 1 ^ For larger systems, however, one expects 
limitations to maintaining the sample size using the present protocol, as explained below. 
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B. Application of fragment combination for estimating relative protein-ligand 
affinities 

Because the fragment combination procedure can be applied to fragments of small 
molecules, and not just to peptides as in the present report, the approach can be applied 
to calculate approximate relative affinities. That is, one can grow a ligand into the binding 
pocket of a protein receptor and calculate its free energy. A number of different approxima- 
tions can be imagined. Most simply, the receptor can be held rigid and the ligand grown in 
the fields (van der Waals and electrostatic) of the receptor. In a better approximation, the 
binding-site side-chains can be grown along with the ligand. One can expect affinities based 
on such free energy calculations to be superior to their docking counterparts because en- 
tropy is included. To produce a relative affinity estimate between two ligands, the respective 
solvation terms would need to be included as usual.— 

C. Efficiency of fragment combination for equilibrium sampling 

As we have already noted, the fragment combination protocol we have described produces 
equilibrium ensembles simultaneously with free energy estimates. It is natural to wonder 
whether such ensembles are produced more efficiently than by standard dynamics simulation 
especially given that small peptides have been shown to have multi-nanosecond relaxation 
times.— In fact, as we will carefully document in a separate publication,— fragment combina- 
tion can lead to sampling that is faster by several orders of magnitude. However, somewhat 
more sophisticated resampling schemes and different fragment sizes are useful in reaching 
the highest levels of efficiency, as will be reported. 
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D. Use of implicit solvent models 

It is interesting and important to consider the additional costs which would be entailed 
by using a standard implicit solvent model, such as GBSA. — First, both the libraries and the 
interaction tables would need to be regenerated using the implicit solvent model. Although 
this could take several weeks of single-CPU time, it needs to be done only a single time 
for a given model. The second cost is for additional solvent calculations not included in 
the libraries and tables. We hope to report on the staging and computational expense in a 
forthcoming publication. 

E. Relaxation simulations for large systems 

In the protocol employed for this study, the equilibrium ensemble generated at one stage, 
say j, is used to calculate the incremental free energy difference to the next stage, j ' + 1. To 
continue the process, the ensemble at stage j + 1 is produced by resampling ensemble j as 
described in the Methods section. However, it is possible the resampled ensemble will con- 
tain a small number of distinct configurations in an important part of configuration space. 
Such configurations will have high weight prior to resampling and thus the problem can be 
diagnosed by noting whether any configurations are resampled multiple times. Clearly, du- 
plicated configurations will not be statistically independent and lead to increased statistical 
error in free energy estimates. 

One solution to this problem would be to relax duplicated configurations — i.e. to perform 
short equilibrium simulations to create distinct configurations. The statistical justification 
of such an approach is somewhat technical and will be described in future work as required. 
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F. Alternative staging using partial interactions 

Additional incremental stages can be added by considering only subsets of interactions. 
For instance, in the case of di-alanine (Ace-(Ala)2-Nme) which is composed of four fragments 
(A, B, C, D), there are three sets of non-bonded interactions: AC, AD, and BD. Our present 
implementation adds all three in a single stage, but they could be added one at a time. 
Undoubtedly, in larger systems, such finer staging will be necessary and probably should be 
required by relaxation of duplicated configurations as just described. 

V. SUMMARY AND CONCLUSIONS 

We have extended our earlier work on computation of absolute free energies for molecular 
systems - in two ways: (i) by staging the calculation using molecular fragments; and (ii) by 
pre-calculating "statistical libraries" of fragment configurations, intra-fragment energies, and 
inter-fragment interactions. For a series of test systems — the alanine monomer, dimer, and 
tetramer — we were able to compute extremely precise free energies, with fluctuations <C 
1 kcal/mole. The calculations were quite fast, furthermore, with the slowest requiring less 
than an hour of single-processor computer time. The speed results from employing (infinitely 
re-usable) pre-calculated library configurations and interactions, which pre-empt expensive 
energy calculations. Our statistical libraries of amino-acid and capping-group fragments are 
available on our website (www.ccbb.pitt.edu/Zuckerman). 

A future application of potential importance in computational biochemistry is the esti- 
mation of binding affinities of small molecules to proteins. We hope to develop fragment 
libraries suitable for small molecules — following ideas developed long ago^ 1 ^ — to be 
used in computing free energies within a potentially flexible protein binding site. Protein 
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flexibility could be included using the side-chains of the binding site as fragments in the 
calculations. 

Another application closely related to the present report is the use of library-based frag- 
ment combination for equilibrium sampling. While the calculations reported here already 
yield equilibrium ensembles, we are actively studying more efficient schemes based on ad- 
vanced resampling approaches 3 - 7 - 1 ^ and a range of fragment sizes. This work will be reported 
in a separate publication.— 
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Figure captions: 

Figure 1: Stages for calculating the absolute free energy of a molecule by combining 
three fragments, based on Eq. (jHj). Connecting lines schematize full interactions between 
fragments, including both bonded and non-bonded atomistic terms, (a) The first inter- 
mediate stage comprises non-interacting fragments, but includes all interactions internal 
to each fragment, (b) The second stage adds interactions among the atoms of fragments 
A and B, while (c) the third stage does the same for fragments B and C. (d) In the final 
stage, representing the desired free energy F phys , all interactions are added, including among 
non-sequential fragments and possibly including an implicit solvent model. 

Figure 2: Stages used in the free energy calculation of a four-fragment molecule, cor- 
responding to Eq. ((9)). The initial stages proceed in analogy to Fig. [Q with pair- wise 
interactions added one at a time for neighboring ("bonded") fragments. In the final stage, 
all remaining interactions are added. Other, more incremental staging schemes are possible, 
but were not necessary in the present study. 

Figure 3: Comparison of equilibrium distributions from fragment combination and 
Langevin simulation. The graphs show the fractional population in different regions of 
configuration space, as described in Sec. II K. Three peptides are considered: (a) alanine 
dipeptide, (b) di-alanine, and (c) tetra-alanine. The error bars for both the fragment com- 
bination and Langevin results reflect twice the standard deviations among 20 independent 
simulations, roughly a 95% confidence interval. Each Langevin simulation was 50 nsec long. 
The statistical agreement is good in every case. 
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Alanine dipeptide free energy terms from Eqs. (jT9j) and ([20]) 



Three Fragments 


Two Fraj 


mients 


Term 


i_ * i_ n i / n 

Estimate [kcal/molj 


Term Estimate [kcal/mol] 


-pAce 


14.783(0.003) 


^Ace-Ala 


47.311(0.027) 


^Ala 


33.326(0.015) 


-PlSfme 


16.574(0.003) 


-^Nme 


16.574(0.003) 


Ai*Ace— Ala— »Nme 


-0.792(0.002) 


^■^Ace-vAla 


-0.801(0.002) 






AFAla-^Nmc 


-0.499(0.007) 






^^nonbonded 


-0.285(0.008) 








63.098(0.015) 


^phys 


63.093(0.028) 



Table I: Comparison between the absolute free energy for alanine dipeptide estimate using two 
different fragmentation schemes. Our "standard" three-fragment decomposition (Ace, Ala, Nme) 
is compared to a two-fragment grouping (Ace- Ala, Nme). The table gives free energy values in 
kcal/mole, as well as two standard deviations (in parentheses) based on 20 independent calculations. 
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Free energy terms for di-alanine from Eq. (|21|) and for tetra-alanine from Eq. (I22p 



Di-alanine 


Tetra-alanine 


Term 


Estimate [kcal/molj 


Term 


Try i_* i_ n 1 / n 

Estimate [kcal/molj 


^Ace 


14.783(0.003) 


Facc 


14.783(0.003) 




33.326(0.015) 




33.326(0.015) 


F^me 


16.574(0.003) 


F^me 


16.574(0.003) 


A 7T 1 

^-^Ace^Ala 


-0.801(0.002) 


A Z7 1 

^^Ace^Alal 


-0.801(0.002) 


AFAla^Ala 


-0.771(0.014) 


AFAlal^Ala2 


-0.774(0.013) 


A-^Ala^Nme 


-0.499(0.013) 


A-^Ala2-»Ala3 


-0.774(0.012) 


^ -^nonbonded 


-0.809(0.031) 


A.i*Ala3->Ala4 


-0.771(0.014) 






AFAla4^Nme 


-0.498(0.009) 






^-^nonbonded 


-1.986(0.284) 




95.128(0.057) 


^?phys 


159.057(0.293) 



Table II: Free energy terms used in calculating the absolute free energy for di-alanine and tetra- 
alanine. The table gives free energy values in kcal/mole, as well as two standard deviations (in 
parentheses) based on 20 independent calculations. 
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